News | Språkbanken Text

New corpus reflects the Swedish vocabulary of the 20th century

30 November 2023

SAOB1950 is a corpus of scanned books from 1950 to 2007. The texts are used as source material for updating SAOB, Svenska Akademiens ordbok (the Swedish Academy Dictionary), with a selection that reflects the Swedish vocabulary, above all that of the second half of the 20th century.

The corpus is now available in Korp, both in the modern mode and in a dedicated SAOB mode for which the SAOB editorial team has made its own corpus selection.

The corpus can also be downloaded as sets of sentences in scrambled order from Språkbanken Text's language data page.

Glossary

corpus: a large collection of language data

CALD-pseudo workshop at EACL 2024

24 November 2023

You are welcome to submit contributions to the CALD-pseudo workshop on computational approaches to the pseudonymization of language data. The workshop is part of the conference of the European Chapter of the Association for Computational Linguistics (EACL), which takes place 21-22 March 2024 in Malta.

Access to research data is critical in many research domains, but personal content often prevents data from being reused. The General Data Protection Regulation, GDPR (European Commission, 2016), proposes pseudonymization as a way to secure open access to research data. The main challenge is how to pseudonymize data effectively so that individuals cannot be identified, while keeping the data useful for research in fields such as computational linguistics, linguistics and natural language processing.

The workshop will discuss several challenges in pseudonymization.

Read more on the workshop's website >>

RaPID-5@LREC-COLING2024 - Full day event in May 2024 in Turin, Italy

21 November 2023

The 5th RaPID Workshop (RaPID-5) is an interdisciplinary forum for researchers to share information, findings, methods, models and experience of the collection and processing of data produced by individuals with various forms of mental, cognitive, neuropsychiatric or neurodegenerative disabilities, such as aphasia, dementia, autism, Parkinson's disease or schizophrenia. RaPID-5 will be open for contributions very soon.

RaPID-5@LREC-COLING2024: Resources and ProcessIng of linguistic, para-linguistic and extralinguistic Data from people with various forms of cognitive/psychiatric/developmental

Full day event: May 2024 (exact date TBA)
Location: Lingotto Conference Centre - Turin, Italy
More information: https://spraakbanken.gu.se/en/rapid-2024

This data includes spontaneous [continuous] speech and transcriptions, eye movement measurements, and various types of digital and multimodal biomarkers such as sensor data from mobile phones, smart watches, wearable devices, and the like.

Of particular interest to RaPID-5 are studies of the relationship between different linguistic, paralinguistic and extralinguistic observations that can aid the identification, extraction, correlation, evaluation and modelling of different linguistic and/or multimodal phenotypes and measurements, which can be used to facilitate diagnosis, monitor development or identify individuals at risk of developing neurodegenerative or neuropsychiatric diseases.

RaPID-5 particularly welcomes contributions on multidisciplinary aspects of processing data from the aforementioned populations, with a focus on the interaction between clinical/medical science/informatics, language technology, and computer science.

Autumn workshop revisited: Strix

20 November 2023

Strix is a text research platform that makes it possible to analyze whole texts and documents. At this year's Autumn Workshop, Yousuf Ali Mohammed of Språkbanken Text talked about the advantages of Strix.

Read the news item on Nationella språkbanken's website >>

Meet our new PhD student Maria Irena Szawerna

13 November 2023

Research data often contains both personal and sensitive information, which can be a problem if you want to share the data. Our newest PhD student Maria Irena Szawerna will help with this problem by focusing on pseudonymization, especially pseudonym generation.

Born and raised in Wrocław, Poland, she took her Master’s degree in linguistics in Heidelberg, Germany. When she started dating a Swede, she began planning a move to Sweden. At the same time, she started to look for something more practical to do with her linguistic knowledge.

– My friends from college became copywriters, translators and teachers. I started thinking about doing computational linguistics.

So Maria got accepted into the Master in Language Technology programme at the University of Gothenburg.

– I enjoy the academic stuff, it is a kind of a family tradition. Many of my family members were teachers or worked in academia so it is familiar to me.

Having graduated, she started to look for work and was made aware of a PhD position at Språkbanken Text. It fitted what she had worked on before: corpus linguistics. She is working with Elena Volodina and her project Mormor Karl. One goal is to create algorithms for automatic pseudonymization of research data. This has the benefit of increasing the accessibility of data that contains sensitive information.

– Hopefully my work will give students an easier situation working with contemporary data than I had.

In her spare time Maria likes to play games, everything from computer to roleplaying games. She also enjoys going out to take pictures of Swedish wildlife.

– Now we play the new Swedish edition of Drakar & Demoner. It trains my Swedish, even though I mostly get better at the names of medieval arms!

The Eurasian oystercatcher (strandskata / Haematopus ostralegus, picture taken in Uddevalla by Maria Irena Szawerna)